Further experiments on audio-visual speech source separation

نویسندگان

David Sodoyer

Laurent Girin

Christian Jutten

Jean-Luc Schwartz

چکیده

Looking at the speaker’s face seems useful to better hear a speech signal and extract it from competing sources before identification. This might result in elaborating new speech enhancement or extraction techniques exploiting the audio-visual coherence of speech stimuli. In this paper, we present a set of experiments on a novel algorithm plugging audio-visual coherence estimated by statistical tools, on classical blind source separation algorithms. We show in the case of additive mixtures that this algorithm performs better than classical blind tools both when there are as many sensors as sources, and when there are less sensors than sources. Audiovisual coherence enables to focus on the speech source to extract. It may also be used at the output of a classical source separation algorithm, to select the “best” sensor in reference to a target source.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem holds a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed ti...

متن کامل

Extracting an AV speech source f

We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the face speaker, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker’s lip movements. We define a statistical model of the joint probability of visual and spectral audio input for quantifying the ...

متن کامل

Speech extraction based on ICA and audio-visual coherence

We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the speaker’s face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker’s lip movements. We define a statistical model of the joint probability of visual and spectral audio input for quantifying th...

متن کامل

Audio visual speech source separation via improved context dependent association model

In this paper, we exploit the non-linear relation between a speech source and its associated lip video as a source of extra information to propose an improved audio-visual speech source separation (AVSS) algorithm. The audio-visual association is modeled using a neural associator which estimates the visual lip parameters from a temporal context of acoustic observation frames. We define an objec...

متن کامل

Visual voice activity detection as a help for speech source separation from convolutive mixtures

Audio–visual speech source separation consists in mixing visual speech processing techniques (e.g., lip parameters tracking) with source separation methods to improve the extraction of a speech source of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with separation methods based on the sparseness of speech: visual informat...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2003

Further experiments on audio-visual speech source separation

نویسندگان

چکیده

منابع مشابه

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

Extracting an AV speech source f

Speech extraction based on ICA and audio-visual coherence

Audio visual speech source separation via improved context dependent association model

Visual voice activity detection as a help for speech source separation from convolutive mixtures

عنوان ژورنال:

اشتراک گذاری